
Clean up benchmarks #465

Merged — 1 commit merged into rust-random:master on May 21, 2018
Conversation

pitdicker (Contributor)

This includes the change as discussed for the gen_bool and Bernoulli benchmarks.

Also, it turns out almost all our uses of black_box were unnecessary; it is enough to return a variable or some accumulator at the end of the benchmark run.
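The accumulator pattern described above can be sketched as follows. This is illustrative only: a toy xorshift stands in for rand's RNGs, and `RAND_BENCH_N`/`bench_body` are made-up names here — the real benchmarks run under `test::Bencher`.

```rust
// Minimal sketch of "return an accumulator instead of black_box".
// Assumption: a toy xorshift32 stands in for the real RNG.
struct XorShift32(u32);

impl XorShift32 {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

const RAND_BENCH_N: u32 = 1000;

// Because the accumulator feeds the return value, the optimizer cannot
// discard the loop body — no black_box call is needed.
fn bench_body(rng: &mut XorShift32) -> u32 {
    let mut accum = 0u32;
    for _ in 0..RAND_BENCH_N {
        accum = accum.wrapping_add(rng.next_u32());
    }
    accum
}

fn main() {
    let mut rng = XorShift32(0x1234_5678);
    println!("{}", bench_body(&mut rng));
}
```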

Results with `cargo benchcmp --threshold 2`:

 name                             master ns/iter       new ns/iter          diff ns/iter   diff %  speedup 
 distr_standard_alphanumeric      2,430 (1646 MB/s)    1,803 (2218 MB/s)            -627  -25.80%   x 1.35 
 distr_standard_bool              4,379 (228 MB/s)     1,055 (947 MB/s)           -3,324  -75.91%   x 4.15 
 distr_uniform_i64                5,497 (1455 MB/s)    5,384 (1485 MB/s)            -113   -2.06%   x 1.02 
 gen_u32_hc128                    2,498 (1601 MB/s)    2,265 (1766 MB/s)            -233   -9.33%   x 1.10 
 gen_u32_isaac                    3,448 (1160 MB/s)    3,593 (1113 MB/s)             145    4.21%   x 0.96 
 gen_u32_std                      2,223 (1799 MB/s)    2,508 (1594 MB/s)             285   12.82%   x 0.89 
 gen_u64_jitter                   23,335               23,998                        663    2.84%   x 0.97 
 init_chacha                      31                   27                             -4  -12.90%   x 1.15 
 init_hc128                       5,072                4,852                        -220   -4.34%   x 1.05 
 init_isaac                       1,454                1,417                         -37   -2.54%   x 1.03 
 init_jitter                      23,329               24,014                        685    2.94%   x 0.97 
 misc_bernoulli_const             2,234                3,873                       1,639   73.37%   x 0.58 
 misc_bernoulli_var               2,922                6,216                       3,294  112.73%   x 0.47 
 misc_gen_bool_const              2,201                4,221                       2,020   91.78%   x 0.52 
 misc_gen_bool_var                4,469                6,211                       1,742   38.98%   x 0.72 
 misc_sample_slice_ref_10_of_100  158                  167                             9    5.70%   x 0.95 
 thread_rng_u32                   3,197 (1251 MB/s)    3,461 (1155 MB/s)             264    8.26%   x 0.92 
  • The misc_gen_bool and misc_bernoulli benchmarks are clearly different (in part because of using StdRng).
  • Benchmarking distributions producing a bool or char using an accumulator gives more realistic results.
  • Hc128Rng and StdRng produce results somewhere between 1250 MB/s and 1800 MB/s, without recompilation. Still don't know what causes the variance.
 • I didn't dare to remove black_box from the gen_bytes benchmarks; it improves performance a bit more than seems plausible, and I didn't want to dig through the assembly for those...
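For a byte-buffer benchmark like gen_bytes there is no cheap scalar accumulator to return, which is presumably why black_box stays. A minimal sketch of that shape, with a counter-based fill standing in for the real `fill_bytes` (the buffer size and fill logic here are assumptions for illustration):

```rust
use std::hint::black_box;

// Stand-in for an RNG's fill_bytes: writes a running counter into the buffer.
fn fill_bytes(buf: &mut [u8], state: &mut u8) {
    for b in buf.iter_mut() {
        *state = state.wrapping_add(1);
        *b = *state;
    }
}

fn main() {
    let mut buf = [0u8; 1024];
    let mut state = 0u8;
    fill_bytes(&mut buf, &mut state);
    // The whole buffer is passed through black_box to keep the writes
    // live; returning 1024 bytes by value is not a practical alternative.
    black_box(&buf);
    println!("{}", buf[0]);
}
```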

```rust
let mut accum = 0u32;
for _ in 0..::RAND_BENCH_N {
    let x: char = distr.sample(&mut rng);
    accum = accum.wrapping_add(x as u32);
}
```
Member:

if this is the only difference, I think you could just do the same for booleans, though doesn't really matter

```diff
 for _ in 0..::RAND_BENCH_N {
-    black_box(rng.gen_bool(p));
+    accum ^= rng.gen_bool(p);
+    p += 0.0001;
```
Member:

Adjusting p seems sensible, but it adds an extra op, making this incomparable with misc_gen_bool_const. Was the old method not okay?

pitdicker (Contributor, Author):

Although LLVM didn't move the range set-up code out of the inner loop, I see no reason why it couldn't. Changing p makes sure it cannot.

Funny thing: using black_box before the for loop is slower than incrementing p each round.
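The varying-p idea discussed here can be sketched in isolation. Everything below is illustrative: a plain threshold test stands in for `rng.gen_bool`, and the seed, starting p, and round count are arbitrary choices, not values from the benchmarks.

```rust
// Toy xorshift32 step, standing in for a real RNG.
fn xorshift(state: &mut u32) -> u32 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    x
}

// Stand-in for rng.gen_bool(p): compare a uniform draw against p.
fn gen_bool(state: &mut u32, p: f64) -> bool {
    (xorshift(state) as f64) < p * (u32::MAX as f64)
}

// Incrementing p each round keeps the per-p set-up work inside the loop;
// with a constant p, the compiler could hoist that set-up out entirely.
fn run(seed: u32, rounds: u32) -> bool {
    let mut state = seed;
    let mut p = 0.18;
    let mut accum = false;
    for _ in 0..rounds {
        accum ^= gen_bool(&mut state, p);
        p += 0.0001;
    }
    accum
}

fn main() {
    println!("{}", run(0xDEAD_BEEF, 1000));
}
```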

@pitdicker (Contributor, Author)

Updated to fold the char and bool distribution macros back into one.

```diff
 for _ in 0..::RAND_BENCH_N {
-    black_box(rng.sample(d));
+    let d = rand::distributions::Bernoulli::new(p);
```
Member:

In this case I think the point was to check what effect using an unknown p has over using a constant, not to adjust the distribution each loop and do a single call for each instance.

pitdicker (Contributor, Author):

That is also possible to benchmark, but then we should expect it to give the same result as the const version. It does not seem like something all that useful to measure to me.
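The "unknown p" variant under discussion — a fixed distribution whose parameter the optimizer cannot constant-fold — could be sketched like this. A toy threshold-based Bernoulli stands in for rand's real `Bernoulli`; the type, seed, and trial count here are assumptions for illustration.

```rust
use std::hint::black_box;

// Toy Bernoulli: sample by comparing a uniform u32 draw to a threshold.
struct Bernoulli {
    threshold: u32,
}

impl Bernoulli {
    fn new(p: f64) -> Self {
        Bernoulli { threshold: (p * (u32::MAX as f64)) as u32 }
    }

    fn sample(&self, state: &mut u32) -> bool {
        let mut x = *state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        *state = x;
        x < self.threshold
    }
}

fn main() {
    // black_box hides the literal 0.18 from the optimizer, modelling an
    // "unknown p", while the set-up still runs once, outside the loop.
    let d = Bernoulli::new(black_box(0.18));
    let mut state = 0x9E37_79B9u32;
    let mut count = 0u32;
    for _ in 0..10_000 {
        if d.sample(&mut state) {
            count += 1;
        }
    }
    println!("{}", count);
}
```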

@dhardy dhardy merged commit 8f57a13 into rust-random:master May 21, 2018
@pitdicker pitdicker deleted the benchmarks branch May 21, 2018 12:43